We covered some basic plots previously; here we first look at how to customize those base graphics.
library(readr)
death = read_csv(
  "http://johnmuschelli.com/intro_to_r/data/indicatordeadkids35.csv")
death[1:2, 1:5]
# A tibble: 2 x 5
X1 `1760` `1761` `1762` `1763`
<chr> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan NA NA NA NA
2 Albania NA NA NA NA
colnames(death)[1] = "country"
death[1:2, 1:5]
# A tibble: 2 x 5
country `1760` `1761` `1762` `1763`
<chr> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan NA NA NA NA
2 Albania NA NA NA NA
library(dplyr)
sweden = death %>%
  filter(country == "Sweden") %>%
  select(-country)
year = as.numeric(colnames(sweden))
plot(as.numeric(sweden) ~ year)
Most plots in the base 'graphics' package accept the same customization arguments.
The default y-axis label isn't informative. We can change it with ylab (xlab for the x-axis) and set the main title with main.
plot(as.numeric(sweden) ~ year,
ylab = "# of deaths per family", main = "Sweden", type = "l")
Let's drop any of the projections and keep it to year 2012, and change the points to blue.
plot(as.numeric(sweden) ~ year,
ylab = "# of deaths per family", main = "Sweden",
xlim = c(1760,2012), pch = 19, cex=1.2,col="blue")
You can also use the subset argument in the plot() function, only when using formula notation:
plot(as.numeric(sweden) ~ year,
ylab = "# of deaths per family", main = "Sweden",
subset = year < 2015, pch = 19, cex=1.2,col="blue")
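As a small sketch of why the formula notation matters here, using made-up toy vectors rather than the course data: subset is only understood by the formula interface, so with plain x and y vectors the same restriction has to be done by indexing.

```r
# Toy data, invented purely for illustration
year = 2008:2017
deaths = seq(0.50, 0.41, by = -0.01)

# Formula interface: subset is evaluated on the plotted variables
plot(deaths ~ year, subset = year < 2015, pch = 19, col = "blue")

# x/y interface: no subset argument, so index the vectors instead
keep = year < 2015
plot(year[keep], deaths[keep], pch = 19, col = "blue",
     xlab = "year", ylab = "deaths")
```

Both calls should draw the same points. The xlim approach shown earlier differs in that it only changes the visible window and does not drop any data.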
After reshaping the data to long, we can plot the data with one data.frame:
library(tidyr)
long = gather(death, key = year, value = deaths, -country)
long = long %>% filter(!is.na(deaths))
head(long)
# A tibble: 6 x 3
country year deaths
<chr> <chr> <dbl>
1 Sweden 1760 2.207555
2 United Kingdom 1760 2.195995
3 Sweden 1761 2.300089
4 United Kingdom 1761 2.347105
5 Sweden 1762 2.785200
6 United Kingdom 1762 2.320127
class(long$year)
[1] "character"
long$year = as.numeric(long$year)
swede_long = long %>% filter(country == "Sweden")
plot(deaths ~ year, data = swede_long)
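The same wide-to-long reshaping can also be sketched with pivot_longer, the newer tidyr function that supersedes gather. The toy table below is invented for illustration and only mimics the shape of the death data (one row per country, one column per year).

```r
library(tidyr)

# Toy wide table; values are made up, not taken from the course data
toy = data.frame(
  country = c("A", "B"),
  `1760` = c(2.2, NA),
  `1761` = c(2.3, 2.1),
  check.names = FALSE   # keep the year column names as-is
)

toy_long = pivot_longer(
  toy,
  cols = -country,         # reshape every column except country
  names_to = "year",       # old column names go into year
  values_to = "deaths",    # cell values go into deaths
  values_drop_na = TRUE    # same effect as filtering out missing deaths
)

# Column names are character, so convert year before plotting
toy_long$year = as.numeric(toy_long$year)
toy_long
```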
ggplot2 is a popular and powerful plotting package built on the grammar of graphics. Its qplot ("quick plot") function works much like plot (qplot is deprecated as of ggplot2 3.4.0 but still runs):
library(ggplot2)
qplot(x = year, y = deaths, data = swede_long)
The generic plotting function is ggplot, which uses aesthetics:
ggplot(data, aes(args))
g = ggplot(data = swede_long, aes(x = year, y = deaths))
g is an object, which you can adapt into multiple plots!
Common aesthetics include x, y, colour (or color), fill, and alpha.
If you set an aesthetic inside aes, you map it to a variable in the data. If you want one fixed value for all observations, set it in a geom, outside aes.
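A minimal sketch of the difference between mapping and setting an aesthetic, assuming a small made-up data frame (toy and its values are illustrative only):

```r
library(ggplot2)

# Toy data, invented for illustration
toy = data.frame(
  year = rep(2000:2002, times = 2),
  deaths = c(1.0, 0.9, 0.8, 2.0, 1.8, 1.7),
  country = rep(c("A", "B"), each = 3)
)

# Mapped inside aes: colour varies with country and a legend is drawn
p_mapped = ggplot(toy, aes(x = year, y = deaths, colour = country)) +
  geom_point()

# Set in the geom: every point is blue and there is no legend
p_set = ggplot(toy, aes(x = year, y = deaths)) +
  geom_point(colour = "blue")

print(p_mapped)
print(p_set)
```

A common mistake is writing aes(colour = "blue"), which maps colour to a constant string rather than setting the colour blue.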
g on its own draws only an empty panel with no data shown; we have to add layers, usually with geom_ commands:
geom_point - add points
geom_line - add lines
geom_density - add a density plot
geom_histogram - add a histogram
geom_smooth - add a smoother
geom_boxplot - add boxplots
geom_bar - bar charts
geom_tile - rectangles/heatmaps
You "add" things to a plot with a + sign (not the pipe!). If you assign a plot to an object, you must print it (call print or type its name) to display it.
gpoints = g + geom_point(); print(gpoints) # one line for slides
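Explicit printing matters most inside loops and functions, where a plot object is not auto-printed. A hedged sketch with made-up toy data (the names toy, plots, and ctry are illustrative):

```r
library(ggplot2)

# Toy data, invented for illustration
toy = data.frame(
  year = rep(2000:2002, times = 2),
  deaths = c(1.0, 0.9, 0.8, 2.0, 1.8, 1.7),
  country = rep(c("A", "B"), each = 3)
)

plots = list()
for (ctry in unique(toy$country)) {
  p = ggplot(toy[toy$country == ctry, ], aes(x = year, y = deaths)) +
    geom_line() +
    ggtitle(ctry)
  plots[[ctry]] = p   # keep the object for later use
  print(p)            # without print(), nothing is drawn inside the loop
}
```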
Otherwise the plot prints by default. This time we draw a line:
g + geom_line()
You can add multiple geoms:
g + geom_line() + geom_point()
Let's add a smoother through the points:
g + geom_line() + geom_smooth()
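By default geom_smooth chooses a smoothing method itself and draws a confidence band. As a sketch on made-up toy data, the method and se arguments can request a straight-line fit without the band:

```r
library(ggplot2)

# Toy data, invented for illustration
toy = data.frame(
  year = 2000:2009,
  deaths = c(1.9, 1.8, 1.8, 1.6, 1.5, 1.5, 1.3, 1.2, 1.2, 1.0)
)

p = ggplot(toy, aes(x = year, y = deaths)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # linear fit, no confidence band
print(p)
```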
If we want a plot with new data, call ggplot again. Group plots by country using colour:
sub = long %>% filter(country %in%
c("United States", "United Kingdom", "Sweden",
"Afghanistan", "Rwanda"))
g = ggplot(sub, aes(x = year, y = deaths, colour = country))
g + geom_line()
Let's remove the legend using the guides function:
g + geom_line() + guides(colour = "none")
ggplot(long, aes(x = year, y = deaths)) + geom_boxplot()
To get a separate boxplot for each year, year must be converted to a factor, but then the x-axis becomes unreadable, with one crowded label per year:
ggplot(long, aes(x = factor(year), y = deaths)) + geom_boxplot()
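One possible way to tidy a crowded discrete x-axis, sketched on simulated toy data (not the course data), is to label only some of the factor levels and rotate the text. The choice of every fifth year is an arbitrary assumption for the example.

```r
library(ggplot2)

# Simulated toy data: 5 random values per year, for illustration only
set.seed(1)
toy = data.frame(
  year = rep(1990:2009, each = 5),
  deaths = runif(100, min = 0, max = 3)
)

# Levels of factor(year) are character strings, so breaks must be too
shown = as.character(seq(1990, 2005, by = 5))

p = ggplot(toy, aes(x = factor(year), y = deaths)) +
  geom_boxplot() +
  scale_x_discrete(breaks = shown) +   # label only every fifth year
  xlab("year") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
print(p)
```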
geom_jitter plots points "jittered" with random noise so they do not overlap:
sub_year = long %>% filter(year > 1995 & year <= 2000)
ggplot(sub_year, aes(x = factor(year), y = deaths)) +
  geom_boxplot(outlier.shape = NA) + # hide outliers; the jittered points show them
  geom_jitter(height = 0)
A facet splits a plot into panels over the levels of one or more variables, keeping the axes the same across panels (you can change that):
sub %>% ggplot(aes(x = year, y = deaths)) + geom_line() + facet_wrap(~ country)
sub %>% ggplot(aes(x = year, y = deaths)) + geom_line() + facet_wrap(~ country, ncol = 1)
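To let each panel use its own axis range, facet_wrap has a scales argument. A minimal sketch with made-up toy data in which the two countries sit on very different scales:

```r
library(ggplot2)

# Toy data, invented for illustration; B is roughly 10 times larger than A
toy = data.frame(
  year = rep(2000:2004, times = 2),
  deaths = c(0.10, 0.09, 0.08, 0.08, 0.07, 1.5, 1.4, 1.2, 1.1, 1.0),
  country = rep(c("A", "B"), each = 5)
)

p = ggplot(toy, aes(x = year, y = deaths)) +
  geom_line() +
  facet_wrap(~ country, ncol = 1, scales = "free_y")  # y-axis free per panel
print(p)
```

Other accepted values are "fixed" (the default), "free_x", and "free". Free scales make each trend easier to see but make panels harder to compare directly.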
You can also facet on multiple factors by joining them with + on the right-hand side of the formula:
sub %>% ggplot(aes(x = year, y = deaths)) + geom_line() + facet_wrap(~ country + x2 + ... )
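As a concrete sketch of faceting on two factors, the toy data below adds a hypothetical second variable, sex, which is not part of the course data and is used only to stand in for x2:

```r
library(ggplot2)

# Toy data, invented for illustration; sex is a hypothetical second factor
toy = expand.grid(
  year = 2000:2004,
  country = c("A", "B"),
  sex = c("F", "M"),
  stringsAsFactors = FALSE
)
toy$deaths = seq_len(nrow(toy)) / 10

# One panel per observed combination of country and sex
p = ggplot(toy, aes(x = year, y = deaths)) +
  geom_line() +
  facet_wrap(~ country + sex)
print(p)
```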
xlab/ylab - functions to change the axis labels; ggtitle - change the title
q = qplot(x = year, y = deaths, colour = country, data = sub,
geom = "line") +
xlab("Year of Collection") + ylab("Deaths /100,000") +
ggtitle("Mortality of Children over the years",
subtitle = "not great")
q
See ?theme_bw for the built-in ggplot2 themes:
q + theme_bw()
theme - change global or specific elements, e.g. increase text size
q + theme(text = element_text(size = 12), title = element_text(size = 20))
q = q + theme(axis.text = element_text(size = 14),
title = element_text(size = 20),
axis.title = element_text(size = 16),
legend.position = c(0.9, 0.8)) +
guides(colour = guide_legend(title = "Country"))
q
transparent_legend = theme(legend.background = element_rect(
fill = "transparent"),
legend.key = element_rect(fill = "transparent",
color = "transparent") )
q + transparent_legend
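Adding a theme with + changes one plot. To change the default for every later plot in a session, one option is theme_set, sketched here on made-up toy data. The base_size value of 14 is an arbitrary choice for the example.

```r
library(ggplot2)

# Toy data, invented for illustration
toy = data.frame(
  year = 2000:2004,
  deaths = c(1.5, 1.4, 1.2, 1.1, 1.0)
)

# theme_set returns the previous default theme, so it can be restored later
old = theme_set(theme_bw(base_size = 14))

p = ggplot(toy, aes(x = year, y = deaths)) + geom_line()
print(p)   # drawn with theme_bw and larger text

theme_set(old)   # restore the earlier default
```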
We can do histograms again using hist. Let's do histograms of death rates over the years:
hist(sub$deaths, breaks = 200)
qplot(x = deaths, fill = factor(country),
data = sub, geom = c("histogram"))
Alpha refers to the opacity of the colour: 1 is fully opaque, and lower values are more transparent.
qplot(x = deaths, fill = country, data = sub,
geom = c("histogram"), alpha=.7)
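The same idea can be sketched with ggplot directly, setting alpha as a fixed value in the geom rather than passing it to qplot. The toy data is simulated for illustration, and position = "identity" is an assumption made here so the histograms overlap instead of stacking, which is where transparency helps.

```r
library(ggplot2)

# Simulated toy data, for illustration only
set.seed(2)
toy = data.frame(
  deaths = c(rnorm(100, mean = 1), rnorm(100, mean = 2)),
  country = rep(c("A", "B"), each = 100)
)

p = ggplot(toy, aes(x = deaths, fill = country)) +
  geom_histogram(alpha = 0.7, bins = 30, position = "identity")
print(p)
```

Because alpha is set outside aes, it applies to all bars and does not add an entry to the legend.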
We could also do densities:
qplot(x= deaths, fill = country, data = sub,
geom = c("density"), alpha= .7)
Use colour instead of fill:
qplot(x = deaths, colour = country, data = sub,
geom = c("density"), alpha= .7)
You can remove the line along the bottom of each density by drawing the curves with geom_line(stat = "density"):
ggplot(aes(x = deaths, colour = country), data = sub) + geom_line(stat = "density")
qplot(x = year, y = deaths, colour = country,
data = long, geom = "line") + guides(colour = "none")